Multi video backend #2153

Open
delexagon wants to merge 11 commits into codeforboston:main from delexagon:multi-video-backend

delexagon (Collaborator) commented Jun 2, 2026

Summary

Adds backend support for hearings that have multiple videos.

Changes:

  • Created an emulator that returns Assembly AI-style transcripts for local testing when ASSEMBLY_API_KEY is not set.
  • Created a backfill function called backfillHearingVideoFormat.
  • Changed backfillHearingTranscriptions to support multiple videos.
  • Split the video/Assembly AI work out of HearingScraper/scrapeHearings into a separate EventPostProcessor, which is meant to update events after they have occurred. It runs on its own as HearingPostProcessor/scrapeVideos.
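For reviewers who want a mental model of the format conversion, here is a minimal sketch of what a single-video to multi-video conversion could look like. The field names (`videoURL`, `transcriptId`, `videos`) and the function name `toMultiVideoFormat` are assumptions made for illustration; the actual document shapes used by backfillHearingVideoFormat are not shown in this description and may differ.

```typescript
// Hypothetical document shapes, for illustration only.
// They are NOT taken from the PR's code.
interface LegacyHearing {
  videoURL?: string;
  transcriptId?: string;
}

interface HearingVideo {
  url: string;
  transcriptId?: string;
}

interface MultiVideoHearing {
  videos: HearingVideo[];
}

function toMultiVideoFormat(
  hearing: LegacyHearing | MultiVideoHearing
): MultiVideoHearing {
  // Already converted: return it unchanged so the backfill is safe to rerun.
  if ("videos" in hearing) return hearing;
  // No video recorded: the new format holds an empty list.
  if (!hearing.videoURL) return { videos: [] };
  const video: HearingVideo = { url: hearing.videoURL };
  if (hearing.transcriptId !== undefined) {
    video.transcriptId = hearing.transcriptId;
  }
  return { videos: [video] };
}
```

Returning already-converted hearings untouched is a design choice in this sketch, so that running the conversion twice gives the same result as running it once.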

Notes:

  • backfillHearingVideoFormat will convert the hearings into the new format
  • backfillHearingTranscriptions will fetch all videos for hearings
  • Interesting hearings to test:
    • 2709 has a video that has duplicate uploads, one labeled MASTER and the other labeled archive.
    • 2731 is like 2709, but one of the listed URLs is a 2-hour video of a "Missing File" screen.
    • 2858 has two seemingly identical, identically named videos at completely different URLs.
  • A list of all hearings known to have multiple videos up to hearing 5471 is [13, 14, 71, 91, 104, 138, 167, 187, 203, 214, 217, 292, 501, 680, 861, 2118, 2137, 2271, 2289, 2290, 2300, 2476, 2662, 2680, 2709, 2731, 2735, 2858, 2904, 2967, 3073, 3080, 3125, 3167, 3171, 3243, 3317, 3362, 3377, 3381, 3402, 3470, 3480, 3486, 3521, 3579, 3580, 3586, 3642, 3646, 3659, 3660, 3668, 3677, 3685, 3689, 3695, 3713, 3716, 3733, 3774, 3792, 3819, 3829, 3846, 3887, 3891, 3892, 3921, 3930, 3933, 3951, 3976, 3988, 4000, 4016, 4049, 4052, 4065, 4071, 4082, 4111, 4112, 4126, 4127, 4149, 4158, 4201, 4258, 4278, 4458, 4469, 4470, 4558, 4600, 4612, 4641, 4699, 4709, 4711, 4734, 4777, 4847, 4880, 5099, 5173, 5207, 5362, 5382, 5441, 5465, 5471].
  • Assembly AI is connected externally only if the environment variable ASSEMBLY_API_KEY has been set.
  • I don't think ${process.env.FUNCTIONS_API_BASE}/transcription points to localhost:5001 in the emulator, so I set it manually. Maybe it should be more generalized.
  • The new ballotquestions pages seem to reference the videoURLs, but not use them.
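The ASSEMBLY_API_KEY note above amounts to a small selection rule: use the real service when a key is present, otherwise fall back to the local emulator. A rough sketch of that rule follows. The type and function names (`TranscriptBackend`, `pickTranscriptBackend`) are hypothetical, and the default base URL reuses the localhost:5001 address mentioned above; the PR's real wiring may look different.

```typescript
// Hypothetical names throughout; only ASSEMBLY_API_KEY and localhost:5001
// come from the PR description.
type TranscriptBackend =
  | { kind: "assemblyai"; apiKey: string }
  | { kind: "local-emulator"; baseUrl: string };

function pickTranscriptBackend(
  env: Record<string, string | undefined>,
  emulatorBaseUrl: string = "http://localhost:5001"
): TranscriptBackend {
  // Treat a missing or whitespace-only key as "not set".
  const apiKey = env.ASSEMBLY_API_KEY?.trim();
  if (apiKey) return { kind: "assemblyai", apiKey };
  return { kind: "local-emulator", baseUrl: emulatorBaseUrl };
}
```

In a Node environment this would be called as `pickTranscriptBackend(process.env)`. Taking the base URL as a parameter is one way to address the "maybe it should be more generalized" concern without hard-coding the emulator address.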

Checklist

  • If I've added new Firestore queries, I've added any new required indexes to firestore.indexes.json (Please do not only create indexes through the Firebase Web UI, even though the error messages may recommend it - indexes created this way may be obliterated by subsequent deploys) - I do not believe this is relevant? I have not changed firestore.indexes.json.

Known issues

The full pipeline from bucket creation to Assembly AI has not been tested; it's worth testing end to end.
The ballot ids page has not been tested; it needs testing on dev.

Steps to test/reproduce

  1. Test backfillHearingVideoFormat (yarn firebase-admin run-script backfillHearingVideoFormat --env local)
  2. Test backfillHearingTranscription for all hearings (yarn firebase-admin run-script backfillHearingTranscription --env local) and for specific hearings (yarn firebase-admin run-script backfillHearingTranscription --env local --eventId 4258) that exist in the database. Test that rerunning this function without --recreateTranscripts does not create new transcriptIds, and that rerunning it with --recreateTranscripts does.
  3. Test the functions scrapeSingleHearing and scrapeSingleHearingv2
curl -X POST 'http://localhost:5001/demo-dtp/us-central1/scrapeSingleHearingv2' \
  -H "Content-Type: application/json" \
  -d '{"data": { "eventId": 3713 }}'
  4. Test pubsub functions (curl 'http://localhost:5001/demo-dtp/us-central1/triggerPubsubFunction?scheduled=scrapeHearings') (curl 'http://localhost:5001/demo-dtp/us-central1/triggerPubsubFunction?scheduled=scrapeVideos')
  5. Test that hearing indexing is functional
  6. Test that Assembly AI is interpreted properly after changing ASSEMBLY_API_KEY in functions/.secret.local
  7. Test migrateHearingTranscription (After conversion of dev)
  8. Check that whatever the heck the ballot ids page is doing hasn't been broken
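The rerun check in the backfill step (no new transcriptIds unless --recreateTranscripts is passed) can be pictured as a filter over a hearing's videos. This is only a sketch under assumed names: `VideoRecord` and `videosNeedingTranscripts` are hypothetical, and the real script may track transcripts differently.

```typescript
// Hypothetical record shape; the PR does not show its data model.
interface VideoRecord {
  url: string;
  transcriptId?: string;
}

// Returns the videos that a run should send for transcription.
// Without recreateTranscripts, videos that already have a transcriptId
// are skipped, so a rerun creates nothing new.
function videosNeedingTranscripts(
  videos: readonly VideoRecord[],
  recreateTranscripts: boolean
): VideoRecord[] {
  return videos.filter(
    (video) => recreateTranscripts || video.transcriptId === undefined
  );
}
```

When testing by hand, the equivalent check is to record the transcriptIds before the rerun and compare them afterwards.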

Conversion process

Run yarn firebase-admin run-script backfillHearingVideoFormat (This might be too much at once?)
Run yarn firebase-admin run-script backfillHearingTranscription (This runs in batches, I think) - env var ASSEMBLY_API_KEY must be set
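If converting everything at once does turn out to be too much, one common option is to process the hearings in fixed-size chunks. The helper below is a generic sketch, not code from this PR; the name `chunk` and any batch size chosen by the caller are assumptions.

```typescript
// Generic helper: split a list into consecutive groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    groups.push(items.slice(start, start + size));
  }
  return groups;
}
```

A backfill script could then loop over `chunk(hearingIds, batchSize)` and commit each group before moving to the next, which keeps any single write or API burst bounded.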

vercel Bot commented Jun 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| maple-dev | Ready | Preview, Comment | Jun 10, 2026 1:40am |

@delexagon delexagon marked this pull request as ready for review June 10, 2026 00:15
@delexagon delexagon marked this pull request as draft June 10, 2026 00:40
@delexagon delexagon marked this pull request as ready for review June 10, 2026 00:44